import pandas as pd
import matplotlib.pyplot as plt
import nltk
Date: August 2, 2023
Name: Chanleakhana Thon
Introduction
Description: The dataset I chose is called “UFO Sightings” and includes over 60,000 records of UFO sightings around the world (the table loaded below has 60,632 rows). Each record includes the UFO’s shape, the location of the sighting (city, state, country, and coordinates), the duration of the sighting, and the time of the sighting.
Link: https://corgis-edu.github.io/corgis/datasets/csv/ufo_sightings/ufo_sightings.csv
Motivation: I am interested in conspiracy theories and extraterrestrial beings, and I would like to find out whether there are patterns behind UFO sightings and what those patterns could mean.
Questions:
* Is there a pattern among the most common times and locations of UFO sightings?
* Is there a common theme among the locations with the most UFO sightings?
* Does the duration of the UFO sighting play an important role?
* Do these patterns imply that these UFO sightings can be credible or debunked?
* How have UFO sightings changed over time?
Methods
import requests
response=requests.get('https://corgis-edu.github.io/corgis/datasets/csv/ufo_sightings/ufo_sightings.csv')
response
<Response [200]>
Data Summary: The dataset includes the shape of the UFO, the location of the sighting (city, state, and country), the duration of the encounter (in seconds), a short description of the sighting, the coordinates of the sighting, the date and time of the sighting (year, month, day, hour, minute), and the date the sighting was documented. All of the columns are numerical except for the description, shape, and location columns, which are categorical (text).
url='https://corgis-edu.github.io/corgis/datasets/csv/ufo_sightings/ufo_sightings.csv'
df=pd.read_csv(url)
df.head(5)

| | Location.City | Location.State | Location.Country | Data.Shape | Data.Encounter duration | Data.Description excerpt | Location.Coordinates.Latitude | Location.Coordinates.Longitude | Dates.Sighted.Year | Dates.Sighted.Month | Date.Sighted.Day | Dates.Sighted.Hour | Dates.Sighted.Minute | Dates.Documented.Year | Dates.Documented.Month | Dates.Documented.Day |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | anchor point | AK | US | disk | 300.0 | Large UFO over Mt. ILIAMNA Alaska. ((NUFORC N... | 59.776667 | -151.831389 | 2005 | 5 | 24 | 18 | 30 | 2005 | 5 | 28 |
| 1 | anchorage | AK | US | changing | 21600.0 | We could observe red lights dancing across the... | 61.218056 | -149.900278 | 2000 | 12 | 31 | 21 | 0 | 2001 | 2 | 18 |
| 2 | anchorage | AK | US | changing | 600.0 | INTENSE AMBER-ORANGE HONEYCOMB SHAPED DUAL HOR... | 61.218056 | -149.900278 | 2006 | 10 | 23 | 21 | 3 | 2006 | 12 | 7 |
| 3 | anchorage | AK | US | cigar | 15.0 | I explained away the first time I thought I se... | 61.218056 | -149.900278 | 2014 | 3 | 29 | 20 | 45 | 2014 | 4 | 4 |
| 4 | anchorage | AK | US | circle | 300.0 | Orange circles "climbing" then fadin... | 61.218056 | -149.900278 | 2011 | 10 | 21 | 21 | 0 | 2011 | 10 | 25 |
Summary Statistics:
df.describe()

| | Data.Encounter duration | Location.Coordinates.Latitude | Location.Coordinates.Longitude | Dates.Sighted.Year | Dates.Sighted.Month | Date.Sighted.Day | Dates.Sighted.Hour | Dates.Sighted.Minute | Dates.Documented.Year | Dates.Documented.Month | Dates.Documented.Day |
|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 6.063200e+04 | 60632.000000 | 60632.000000 | 60632.000000 | 60632.000000 | 60632.000000 | 60632.000000 | 60632.000000 | 60632.000000 | 60632.000000 | 60632.000000 |
| mean | 5.410128e+03 | 38.311073 | -95.584796 | 2004.447833 | 6.872658 | 15.026587 | 15.809094 | 17.718367 | 2007.401537 | 6.706063 | 15.229219 |
| std | 4.143867e+05 | 5.552705 | 18.025296 | 10.178389 | 3.249002 | 8.920703 | 7.537834 | 17.924455 | 4.480640 | 3.487636 | 8.789173 |
| min | 1.000000e-02 | 19.426944 | -170.478889 | 1910.000000 | 1.000000 | 1.000000 | 0.000000 | 0.000000 | 1998.000000 | 1.000000 | 1.000000 |
| 25% | 3.000000e+01 | 34.092222 | -114.336667 | 2002.000000 | 4.000000 | 7.000000 | 11.000000 | 0.000000 | 2004.000000 | 4.000000 | 8.000000 |
| 50% | 1.800000e+02 | 38.904306 | -89.911111 | 2007.000000 | 7.000000 | 15.000000 | 19.000000 | 15.000000 | 2008.000000 | 7.000000 | 14.000000 |
| 75% | 6.000000e+02 | 41.924583 | -81.035000 | 2011.000000 | 10.000000 | 22.000000 | 21.000000 | 30.000000 | 2012.000000 | 10.000000 | 22.000000 |
| max | 6.627600e+07 | 70.636944 | -66.984722 | 2014.000000 | 12.000000 | 31.000000 | 23.000000 | 59.000000 | 2014.000000 | 12.000000 | 31.000000 |
df.shape
(60632, 16)
Outlier Data: The date and time columns don’t seem to contain outliers, because their minimum and maximum values fall within the valid ranges for months, days, hours, and minutes. However, the maximum encounter duration is an outlier: 6.627600e+07 seconds is about 767 days, which is far larger than the mean duration of about 5,410 seconds and is probably unrealistic.
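The conversion behind that outlier check can be written out as a short calculation. This is only a worked example using the maximum value reported by df.describe(); the constant name SECONDS_PER_DAY is my own.

```python
# Convert the maximum encounter duration from seconds to days.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds in one day

max_duration_seconds = 6.627600e+07  # "max" row of df.describe() for the duration column
max_duration_days = max_duration_seconds / SECONDS_PER_DAY

print(f"{max_duration_days:.1f} days")  # prints "767.1 days"
```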
Data Preprocessing: I decided to only look at encounters that lasted less than a day, because anything longer could be a typo or could describe multiple sightings over multiple days. Therefore, I dropped every row with an encounter duration of 86,400 seconds or more.
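As a sketch of the same filtering idea, the rows can also be selected with a boolean mask instead of being dropped in place. The small table below is made up for illustration (it is not the real dataset), and the names toy and under_a_day are hypothetical.

```python
import pandas as pd

SECONDS_PER_DAY = 86_400

# Made-up miniature of the dataset: only the duration column matters here.
toy = pd.DataFrame({"Data.Encounter duration": [15.0, 300.0, 86_400.0, 66_276_000.0]})

# Keep encounters strictly shorter than one day; `toy` itself is left untouched.
under_a_day = toy[toy["Data.Encounter duration"] < SECONDS_PER_DAY]

print(len(toy), len(under_a_day))  # prints "4 2"
```

Selecting with a mask keeps the original table available, which can make it easier to compare row counts before and after the filter.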
# check column names
print(df.columns)
Index(['Location.City', 'Location.State', 'Location.Country', 'Data.Shape',
'Data.Encounter duration', 'Data.Description excerpt',
'Location.Coordinates.Latitude ', 'Location.Coordinates.Longitude ',
'Dates.Sighted.Year', 'Dates.Sighted.Month', 'Date.Sighted.Day',
'Dates.Sighted.Hour', 'Dates.Sighted.Minute', 'Dates.Documented.Year',
'Dates.Documented.Month', 'Dates.Documented.Day'],
dtype='object')
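The column listing shows that the two coordinate columns end with a trailing space ('Location.Coordinates.Latitude ' and 'Location.Coordinates.Longitude '), so they have to be typed with that space when indexing. A minimal sketch of one way to normalize the names, using a made-up one-row table (the names toy and cleaned are hypothetical):

```python
import pandas as pd

# Made-up one-row table that reproduces the trailing spaces in the column names.
toy = pd.DataFrame({
    "Location.Coordinates.Latitude ": [59.776667],
    "Location.Coordinates.Longitude ": [-151.831389],
})

# Apply str.strip to every column label; this returns a new DataFrame.
cleaned = toy.rename(columns=str.strip)

print(list(cleaned.columns))
```

The rest of this report keeps the original names with the trailing space, so this step is optional.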
# drop any duplicate rows
# (drop_duplicates() returns a new DataFrame; df itself stays unchanged unless the result is assigned)
df.drop_duplicates()

| | Location.City | Location.State | Location.Country | Data.Shape | Data.Encounter duration | Data.Description excerpt | Location.Coordinates.Latitude | Location.Coordinates.Longitude | Dates.Sighted.Year | Dates.Sighted.Month | Date.Sighted.Day | Dates.Sighted.Hour | Dates.Sighted.Minute | Dates.Documented.Year | Dates.Documented.Month | Dates.Documented.Day |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | anchor point | AK | US | disk | 300.0 | Large UFO over Mt. ILIAMNA Alaska. ((NUFORC N... | 59.776667 | -151.831389 | 2005 | 5 | 24 | 18 | 30 | 2005 | 5 | 28 |
| 1 | anchorage | AK | US | changing | 21600.0 | We could observe red lights dancing across the... | 61.218056 | -149.900278 | 2000 | 12 | 31 | 21 | 0 | 2001 | 2 | 18 |
| 2 | anchorage | AK | US | changing | 600.0 | INTENSE AMBER-ORANGE HONEYCOMB SHAPED DUAL HOR... | 61.218056 | -149.900278 | 2006 | 10 | 23 | 21 | 3 | 2006 | 12 | 7 |
| 3 | anchorage | AK | US | cigar | 15.0 | I explained away the first time I thought I se... | 61.218056 | -149.900278 | 2014 | 3 | 29 | 20 | 45 | 2014 | 4 | 4 |
| 4 | anchorage | AK | US | circle | 300.0 | Orange circles "climbing" then fadin... | 61.218056 | -149.900278 | 2011 | 10 | 21 | 21 | 0 | 2011 | 10 | 25 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 60627 | sheridan | WY | US | oval | 20.0 | blue-green bright oval was spotted 20 miles so... | 44.797222 | -106.955556 | 2002 | 9 | 6 | 21 | 0 | 2002 | 9 | 13 |
| 60628 | thermopolis | WY | US | unknown | 15.0 | UFO near Thermopolis WY | 43.646111 | -108.211389 | 2007 | 6 | 14 | 23 | 0 | 2007 | 8 | 7 |
| 60629 | torrington | WY | US | cigar | 2.0 | I was on a hill enjoying the sunset. I fell as... | 42.065000 | -104.181111 | 2011 | 11 | 5 | 21 | 30 | 2011 | 12 | 12 |
| 60630 | worland | WY | US | light | 15.0 | The object was a dim point of light that grew ... | 44.016944 | -107.954722 | 2003 | 6 | 17 | 22 | 42 | 2003 | 6 | 18 |
| 60631 | worland | WY | US | oval | 2700.0 | ((HOAX??)) My parents told me they saw this U... | 44.016944 | -107.954722 | 2008 | 2 | 15 | 5 | 0 | 2008 | 4 | 17 |
60630 rows × 16 columns
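The output above has 60,630 rows, two fewer than the 60,632 rows loaded. Because drop_duplicates() returns a new table rather than changing the original, the result has to be assigned to be kept. A small sketch with made-up rows (toy and deduped are hypothetical names):

```python
import pandas as pd

# Made-up rows: the first two are exact duplicates.
toy = pd.DataFrame({
    "Location.City": ["anchorage", "anchorage", "worland"],
    "Data.Shape": ["circle", "circle", "oval"],
})

toy.drop_duplicates()            # result is discarded, so `toy` is unchanged
rows_before = len(toy)           # still 3

deduped = toy.drop_duplicates()  # assigning the result keeps the de-duplicated table
rows_after = len(deduped)        # 2

print(rows_before, rows_after)   # prints "3 2"
```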
# remove outlier durations longer than a day (86400 seconds)
df.drop(df[df['Data.Encounter duration'] >= 86400].index, inplace = True)
df

| | Location.City | Location.State | Location.Country | Data.Shape | Data.Encounter duration | Data.Description excerpt | Location.Coordinates.Latitude | Location.Coordinates.Longitude | Dates.Sighted.Year | Dates.Sighted.Month | Date.Sighted.Day | Dates.Sighted.Hour | Dates.Sighted.Minute | Dates.Documented.Year | Dates.Documented.Month | Dates.Documented.Day |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | anchor point | AK | US | disk | 300.0 | Large UFO over Mt. ILIAMNA Alaska. ((NUFORC N... | 59.776667 | -151.831389 | 2005 | 5 | 24 | 18 | 30 | 2005 | 5 | 28 |
| 1 | anchorage | AK | US | changing | 21600.0 | We could observe red lights dancing across the... | 61.218056 | -149.900278 | 2000 | 12 | 31 | 21 | 0 | 2001 | 2 | 18 |
| 2 | anchorage | AK | US | changing | 600.0 | INTENSE AMBER-ORANGE HONEYCOMB SHAPED DUAL HOR... | 61.218056 | -149.900278 | 2006 | 10 | 23 | 21 | 3 | 2006 | 12 | 7 |
| 3 | anchorage | AK | US | cigar | 15.0 | I explained away the first time I thought I se... | 61.218056 | -149.900278 | 2014 | 3 | 29 | 20 | 45 | 2014 | 4 | 4 |
| 4 | anchorage | AK | US | circle | 300.0 | Orange circles "climbing" then fadin... | 61.218056 | -149.900278 | 2011 | 10 | 21 | 21 | 0 | 2011 | 10 | 25 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 60627 | sheridan | WY | US | oval | 20.0 | blue-green bright oval was spotted 20 miles so... | 44.797222 | -106.955556 | 2002 | 9 | 6 | 21 | 0 | 2002 | 9 | 13 |
| 60628 | thermopolis | WY | US | unknown | 15.0 | UFO near Thermopolis WY | 43.646111 | -108.211389 | 2007 | 6 | 14 | 23 | 0 | 2007 | 8 | 7 |
| 60629 | torrington | WY | US | cigar | 2.0 | I was on a hill enjoying the sunset. I fell as... | 42.065000 | -104.181111 | 2011 | 11 | 5 | 21 | 30 | 2011 | 12 | 12 |
| 60630 | worland | WY | US | light | 15.0 | The object was a dim point of light that grew ... | 44.016944 | -107.954722 | 2003 | 6 | 17 | 22 | 42 | 2003 | 6 | 18 |
| 60631 | worland | WY | US | oval | 2700.0 | ((HOAX??)) My parents told me they saw this U... | 44.016944 | -107.954722 | 2008 | 2 | 15 | 5 | 0 | 2008 | 4 | 17 |
60501 rows × 16 columns
# check if the largest encounter durations are less than 86400 seconds
df.nlargest(5, "Data.Encounter duration")

| | Location.City | Location.State | Location.Country | Data.Shape | Data.Encounter duration | Data.Description excerpt | Location.Coordinates.Latitude | Location.Coordinates.Longitude | Dates.Sighted.Year | Dates.Sighted.Month | Date.Sighted.Day | Dates.Sighted.Hour | Dates.Sighted.Minute | Dates.Documented.Year | Dates.Documented.Month | Dates.Documented.Day |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 345 | bessemer | AL | US | unknown | 73800.0 | 10/26/2011 To whom it may concern On or about ... | 33.401667 | -86.954444 | 1987 | 2 | 20 | 1 | 30 | 2011 | 12 | 12 |
| 2651 | phoenix | AZ | US | light | 73800.0 | Long Streaks of Light (that put me in mind of ... | 33.448333 | -112.073333 | 1998 | 2 | 25 | 3 | 0 | 1999 | 1 | 28 |
| 2921 | prescott valley | AZ | US | other | 73800.0 | 3 then up to 6 white flashing lights move erra... | 34.610000 | -112.315000 | 2013 | 10 | 22 | 18 | 30 | 2013 | 11 | 11 |
| 9867 | san fernando | CA | US | triangle | 73800.0 | watch many ligth's in the sky day's befo... | 34.281944 | -118.438056 | 1998 | 8 | 15 | 17 | 30 | 2001 | 1 | 3 |
| 15522 | jacksonville | FL | US | triangle | 73800.0 | SILENT TRIANGLE SPARKLING LIGHTS INVISBLE CENTER. | 30.331944 | -81.655833 | 2010 | 5 | 6 | 20 | 0 | 2010 | 5 | 12 |
Results
Exploratory Data Visualisations
plt.hist(x=df['Dates.Sighted.Hour'], bins=24)
plt.title('Most Common Hour of UFO Sightings')
Text(0.5, 1.0, 'Most Common Hour of UFO Sightings')

First Exploratory Visualization: The first exploratory visualization shows the number of UFO sightings for each hour of the day. Sightings peak around 9 PM and are fewest around 8 AM. The number of sightings seems to increase as it gets darker and decrease when there is more sunlight.
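The peak hour can also be read off as a number instead of from the plot. The sketch below counts sightings per hour on a made-up list of hours (not the real data); the names hours, per_hour, and peak_hour are hypothetical.

```python
import pandas as pd

# Made-up hours of the day (0-23) standing in for df['Dates.Sighted.Hour'].
hours = pd.Series([21, 21, 22, 21, 20, 8, 22, 3], name="Dates.Sighted.Hour")

# Count sightings for each of the 24 hours; hours with no sightings get a count of 0.
per_hour = hours.value_counts().reindex(range(24), fill_value=0)

peak_hour = per_hour.idxmax()  # hour with the most sightings
print(peak_hour, per_hour[peak_hour])  # prints "21 3"
```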
import plotly.graph_objects as go
fig = go.Figure(data=go.Scattergeo(
    lon = df['Location.Coordinates.Longitude '],
    lat = df['Location.Coordinates.Latitude '],
    mode = 'markers'
))
fig.update_layout(
    title = 'Coordinates of UFO Sightings',
    geo_scope='usa',
)
fig.show()
Second Exploratory Visualization: The second exploratory visualization maps the coordinates of every UFO sighting in the United States across all years. Most recorded sightings fall in the eastern half of the United States and along the West Coast. The area in between is much less densely covered.
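One rough way to put a number on the east/west impression from the map is to compute the share of sightings east of a dividing longitude. This is only a sketch: the -100° meridian is my own assumption for where the "east half" begins, and the coordinates below are made up.

```python
import pandas as pd

# Made-up longitudes; note the trailing space in the real column name.
toy = pd.DataFrame({
    "Location.Coordinates.Longitude ": [-149.9, -122.3, -118.2, -87.6, -81.7, -74.0]
})

DIVIDING_MERIDIAN = -100.0  # assumption: rough boundary between west and east

# The mean of a boolean column is the fraction of rows that are True.
east_share = (toy["Location.Coordinates.Longitude "] > DIVIDING_MERIDIAN).mean()

print(east_share)  # prints "0.5"
```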
city=df['Location.City']
top15city=nltk.FreqDist(city).most_common(15)
top15city
for x,y in top15city:
    plt.barh(x,y)
plt.title("Top 15 Cities with UFO Sightings")
Text(0.5, 1.0, 'Top 15 Cities with UFO Sightings')

Third Exploratory Visualisation: The third exploratory visualisation is a horizontal bar graph showing the top 15 cities by number of UFO sightings. The top 5 are big cities in the western United States, which is interesting because in the previous map the dense regions of UFO sightings were mostly in the eastern half of the United States, yet these individual western cities have the greatest number of sightings.
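The top-N counting used for this chart does not depend on nltk in particular. As far as I know, nltk.FreqDist is built on Python's collections.Counter, so the same most_common call works with the standard library. The city list below is made up for illustration.

```python
from collections import Counter

# Made-up city names standing in for df['Location.City'].
cities = ["seattle", "phoenix", "seattle", "las vegas", "seattle", "phoenix"]

# most_common(n) returns the n most frequent items as (item, count) pairs.
top2 = Counter(cities).most_common(2)

print(top2)  # prints "[('seattle', 3), ('phoenix', 2)]"
```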
shape=df['Data.Shape']
top15shapes=nltk.FreqDist(shape).most_common(15)
top15shapes
for x,y in top15shapes:
    plt.barh(x,y)
plt.title("Top 15 UFO Shape Sightings")
Text(0.5, 1.0, 'Top 15 UFO Shape Sightings')

Fourth Exploratory Visualisation: The fourth exploratory visualisation is also a horizontal bar graph, showing the top 15 UFO shapes reported in sightings. The ‘light’ shape is far more common than any other shape, with over 12,000 sightings, while each of the other shapes has fewer than 6,000.
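To compare shapes as proportions rather than raw counts, the counts can be normalized. A minimal sketch on made-up shape labels (the names shapes, shares, and light_share are hypothetical):

```python
import pandas as pd

# Made-up shape labels standing in for df['Data.Shape'].
shapes = pd.Series(["light", "light", "triangle", "circle", "light", "disk", "light", "circle"])

# normalize=True turns counts into fractions that sum to 1, sorted from most to least common.
shares = shapes.value_counts(normalize=True)

light_share = shares["light"]
print(shares.index[0], light_share)  # prints "light 0.5"
```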
Data Visualisations
# Analyze most common UFO shapes in one of the top 5 cities
# Analyze most common UFO shape in one of the cities with fewer sightings
# See if there's a correlation in UFO sightings over time
year=df['Dates.Documented.Year']
sightingsperyear=nltk.FreqDist(year)
sightingsperyear
for i in sightingsperyear:
    plt.scatter(i, sightingsperyear[i])
plt.title("UFO Sightings per Year")
Text(0.5, 1.0, 'UFO Sightings per Year')

First Visualization: The first visualization is a scatter plot showing the number of UFO sightings per year, counted by the year each sighting was documented. The number of sightings generally increases over time, peaking in 2012 with over 6,000 sightings. The count then drops in 2014, which could be because the data does not cover the entire year of 2014.
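Since the dataset has both a sighted year and a documented year, the yearly counts can differ depending on which column is used. The sketch below counts both on made-up rows and sorts by year so the values are in chronological order; by_sighted and by_documented are hypothetical names.

```python
import pandas as pd

# Made-up rows: the first sighting happened in 1987 but was documented in 2011.
toy = pd.DataFrame({
    "Dates.Sighted.Year": [1987, 2011, 2012, 2012, 2013],
    "Dates.Documented.Year": [2011, 2011, 2012, 2012, 2013],
})

# value_counts() counts rows per year; sort_index() orders the result by year.
by_sighted = toy["Dates.Sighted.Year"].value_counts().sort_index()
by_documented = toy["Dates.Documented.Year"].value_counts().sort_index()

print(by_sighted.loc[2011], by_documented.loc[2011])  # prints "1 2"
```

In the made-up data, 2011 has one sighting by sighted year but two by documented year, which shows why the choice of column matters for a "sightings over time" plot.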
# frequency of UFO encounter durations over the years
plt.hist(x=df['Data.Encounter duration'], bins=1000)
plt.xlim([0, 4000])
plt.title('Frequency of Encounter Duration of UFO Sightings')
Text(0.5, 1.0, 'Frequency of Encounter Duration of UFO Sightings')

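Second Visualization: The second visualization is a histogram that shows which encounter durations are the most common. The most frequent encounter duration appears to be about 27,000 seconds, which is about 7.5 hours. Such a long encounter could mean multiple separate encounters over the course of several hours. The second most frequent encounter duration appears much smaller, at around 7,000 seconds, which is about 2 hours. However, the plot only shows durations from 0 to 4,000 seconds, and the summary statistics give a median duration of 180 seconds, so these readings need to be re-checked against the plot's axes.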
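One way to check which duration range is most common without reading it off the plot is to count the values in explicit bins. This is a sketch on made-up durations; the bin edges (1 minute, 5 minutes, 10 minutes, 1 hour, 1 day) are my own choice.

```python
import numpy as np

# Made-up durations in seconds standing in for df['Data.Encounter duration'].
durations = np.array([15.0, 20.0, 30.0, 180.0, 300.0, 300.0, 600.0, 2700.0])

# Bin edges in seconds; each bin includes its left edge (the last bin also includes 86,400).
edges = np.array([0, 60, 300, 600, 3600, 86_400])

counts, _ = np.histogram(durations, bins=edges)

# Find the bin with the highest count and report its range in seconds.
modal_index = int(np.argmax(counts))
modal_range = (int(edges[modal_index]), int(edges[modal_index + 1]))

print(counts.tolist(), modal_range)  # prints "[3, 1, 2, 2, 0] (0, 60)"
```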
seattle = df.loc[df['Location.City'] == "seattle"]
seattleshape=seattle['Data.Shape']
seattlecommonshapes=nltk.FreqDist(seattleshape).most_common(9)
lasvegas = df.loc[df['Location.City'] == "las vegas"]
lasvegasshape=lasvegas['Data.Shape']
lasvegascommonshapes=nltk.FreqDist(lasvegasshape).most_common(9)
la = df.loc[df['Location.City'] == "los angeles"]
lashape=la['Data.Shape']
lacommonshapes=nltk.FreqDist(lashape).most_common(9)
phoenix = df.loc[df['Location.City'] == "phoenix"]
phoenixshape=phoenix['Data.Shape']
phoenixcommonshapes=nltk.FreqDist(phoenixshape).most_common(9)
fig, (ax1, ax2, ax3, ax4) = plt.subplots(nrows=4, ncols=1,gridspec_kw={'hspace': 0.5},figsize=(9, 12))
for shape, count in lasvegascommonshapes:
    ax1.barh(shape, count)
ax1.set_title('Common Shapes in Las Vegas')
ax1.set_xlabel('Frequency')
ax1.set_ylabel('Shape')
for shape, count in seattlecommonshapes:
    ax2.barh(shape, count)
ax2.set_title('Common Shapes in Seattle')
ax2.set_xlabel('Frequency')
ax2.set_ylabel('Shape')
for shape, count in lacommonshapes:
    ax3.barh(shape, count)
ax3.set_title('Common Shapes in Los Angeles')
ax3.set_xlabel('Frequency')
ax3.set_ylabel('Shape')
for shape, count in phoenixcommonshapes:
    ax4.barh(shape, count)
ax4.set_title('Common Shapes in Phoenix')
ax4.set_xlabel('Frequency')
ax4.set_ylabel('Shape')
Text(0, 0.5, 'Shape')

Third Visualization: The third visualization looks closer at the cities with the most UFO sightings to see what their most common UFO shapes were. In this visualization, we look at Las Vegas, Los Angeles, Phoenix, and Seattle. In all four plots, the most commonly sighted UFO shape was “light”, which exceeds the other shapes by far.
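The per-city comparison can also be done in one pass with a group-by instead of one filter per city. The rows below are invented purely to show the mechanics and do not reflect the real results; the names toy, subset, and top_shape are hypothetical.

```python
import pandas as pd

# Invented rows: one record per sighting, with its city and reported shape.
toy = pd.DataFrame({
    "Location.City": ["seattle", "seattle", "seattle", "phoenix", "phoenix", "phoenix", "dixon"],
    "Data.Shape": ["light", "light", "disk", "triangle", "triangle", "light", "oval"],
})

cities_of_interest = ["seattle", "phoenix"]
subset = toy[toy["Location.City"].isin(cities_of_interest)]

# For each city, count its shapes and keep the label with the highest count.
top_shape = subset.groupby("Location.City")["Data.Shape"].agg(
    lambda shapes: shapes.value_counts().idxmax()
)

print(top_shape.to_dict())
```

Adding another city then only requires extending cities_of_interest rather than repeating the filtering code.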
bottom15city = nltk.FreqDist(city).most_common()[-8000:]
bottom15city
[('dixon', 11),
('felton', 11),
('glendora', 11),
('grapevine', 11),
('healdsburg', 11),
('hillsborough', 11),
('huron', 11),
('laguna beach', 11),
('manteca', 11),
('mojave', 11),
('orangevale', 11),
('paso robles', 11),
('placentia', 11),
('rancho santa margarita', 11),
('red bluff', 11),
('san ramon', 11),
('three rivers', 11),
('weaverville', 11),
('delta', 11),
('victor', 11),
('branford', 11),
('darien', 11),
('derby', 11),
('hamden', 11),
('bear', 11),
('bonita springs', 11),
('brooksville', 11),
('crestview', 11),
('inverness', 11),
('okeechobee', 11),
('pinellas park', 11),
('royal palm beach', 11),
('winter springs', 11),
('mcdonough', 11),
('ringgold', 11),
('temple', 11),
('thomasville', 11),
('hilo', 11),
('lahaina', 11),
('dubuque', 11),
('ottumwa', 11),
('middleton', 11),
('alton', 11),
('champaign', 11),
('crestwood', 11),
('deerfield', 11),
('gurnee', 11),
('lemont', 11),
('normal', 11),
('ottawa', 11),
('riverton', 11),
('schaumburg', 11),
('south elgin', 11),
('tuscola', 11),
('urbana', 11),
('fishers', 11),
('michigan city', 11),
('vincennes', 11),
('cameron', 11),
('denham springs', 11),
('attleboro', 11),
('fitchburg', 11),
('methuen', 11),
('northampton', 11),
('norwood', 11),
('revere', 11),
('rockport', 11),
('shrewsbury', 11),
('williamstown', 11),
('ellicott city', 11),
('gorham', 11),
('scarborough', 11),
('new brighton', 11),
('jefferson city', 11),
('kearney', 11),
('willard', 11),
('williamsville', 11),
('carolina beach', 11),
('chapel hill', 11),
('hampstead', 11),
('kernersville', 11),
('kill devil hills', 11),
('kure beach', 11),
('smithfield', 11),
('wrightsville beach', 11),
('clark', 11),
('hopewell', 11),
('sicklerville', 11),
('commack', 11),
('niagara falls', 11),
('beavercreek', 11),
('delaware', 11),
('scappoose', 11),
('indiana', 11),
('phoenixville', 11),
('stroudsburg', 11),
('spartanburg', 11),
('goodlettsville', 11),
('greeneville', 11),
('alvin', 11),
('laredo', 11),
('new braunfels', 11),
('lehi', 11),
('moab', 11),
('tooele', 11),
('east wenatchee', 11),
('poulsbo', 11),
('sammamish', 11),
('wisconsin dells', 11),
('gillette', 11),
('eagle river', 10),
('addison', 10),
('ozark', 10),
('junction city', 10),
('sherwood', 10),
('douglas', 10),
('duncan', 10),
('fountain hills', 10),
('payson', 10),
('boonville', 10),
('canby', 10),
('cathedral city', 10),
('lemoore', 10),
('lompoc', 10),
('los alamos', 10),
('madera', 10),
('newbury park', 10),
('northridge', 10),
('oak park', 10),
('paradise', 10),
('ramona', 10),
('salida', 10),
('san jacinto', 10),
('susanville', 10),
('tehachapi', 10),
('westwood', 10),
('breckenridge', 10),
('norwich', 10),
('harrington', 10),
('cocoa beach', 10),
('holiday', 10),
('jensen beach', 10),
('lantana', 10),
('melbourne beach', 10),
('tarpon springs', 10),
('acworth', 10),
('douglasville', 10),
('maysville', 10),
('kaneohe', 10),
('muscatine', 10),
('west des moines', 10),
('bourbonnais', 10),
('crete', 10),
('loves park', 10),
('martinsville', 10),
('spring grove', 10),
('texas city', 10),
('waukegan', 10),
('willow springs', 10),
('albion', 10),
('bennington', 10),
('brookston', 10),
('connersville', 10),
('corydon', 10),
('logansport', 10),
('warsaw', 10),
('beloit', 10),
('louisburg', 10),
('falmouth', 10),
('hopkinsville', 10),
('perryville', 10),
('sparta', 10),
('verona', 10),
('holden', 10),
('metairie', 10),
('beverly', 10),
('haverhill', 10),
('holyoke', 10),
('randolph', 10),
('southampton', 10),
('wayland', 10),
('holly', 10),
('lapeer', 10),
('madison heights', 10),
('novi', 10),
('chaska', 10),
('bolivar', 10),
('lumberton', 10),
('meadville', 10),
('plains', 10),
('new bern', 10),
('elkhorn', 10),
('hershey', 10),
('papillion', 10),
('cape may', 10),
('paterson', 10),
('piscataway', 10),
('pleasantville', 10),
('ashville', 10),
('bethpage', 10),
('north tonawanda', 10),
('plainview', 10),
('ronkonkoma', 10),
('vestal', 10),
('euclid', 10),
('lorain', 10),
('claremore', 10),
('yukon', 10),
('tualatin', 10),
('enola', 10),
('gettysburg', 10),
('aiken', 10),
('fort mill', 10),
('yankton', 10),
('copperas cove', 10),
('nacogdoches', 10),
('pearland', 10),
('the woodlands', 10),
('waxahachie', 10),
('layton', 10),
('front royal', 10),
('camas', 10),
('lynden', 10),
('port townsend', 10),
('snoqualmie', 10),
('university place', 10),
('lake geneva', 10),
('manitowoc', 10),
('neenah', 10),
('new berlin', 10),
('river falls', 10),
('cordova', 9),
('bradford', 9),
('chelsea', 9),
('cullman', 9),
('oneonta', 9),
('alma', 9),
('bentonville', 9),
('damascus', 9),
('osceola', 9),
('pocahontas', 9),
('benson', 9),
('globe', 9),
('holbrook', 9),
('sun city', 9),
('vail', 9),
('acton', 9),
('calabasas', 9),
('chino hills', 9),
('encino', 9),
('hermosa beach', 9),
('imperial beach', 9),
('la mirada', 9),
('la puente', 9),
('manhattan beach', 9),
('mariposa', 9),
('menifee', 9),
('mill valley', 9),
('perris', 9),
('san juan capistrano', 9),
('south gate', 9),
('south lake tahoe', 9),
('stanton', 9),
('temple city', 9),
('truckee', 9),
('watsonville', 9),
('west los angeles', 9),
('wildomar', 9),
('willits', 9),
('durango', 9),
('oak creek', 9),
('brooklyn', 9),
('cromwell', 9),
('new britain', 9),
('new milford', 9),
('southbury', 9),
('southport', 9),
('west hartford', 9),
('westport', 9),
('wethersfield', 9),
('altamonte springs', 9),
('baldwin', 9),
('dade city', 9),
('dunedin', 9),
('fort pierce', 9),
('hernando', 9),
('hobe sound', 9),
('marathon', 9),
('new smyrna beach', 9),
('palm city', 9),
('saint cloud', 9),
('sebring', 9),
('valrico', 9),
('wesley chapel', 9),
('wildwood', 9),
('byron', 9),
('dalton', 9),
('dawsonville', 9),
('flowery branch', 9),
('lincolnton', 9),
('mitchell', 9),
('suwanee', 9),
('carlisle', 9),
('clear lake', 9),
('urbandale', 9),
('moscow', 9),
('girard', 9),
('lake in the hills', 9),
('leland', 9),
('lewistown', 9),
('murphysboro', 9),
('park ridge', 9),
('pontiac', 9),
('streamwood', 9),
('wadsworth', 9),
('wilsonville', 9),
('winfield', 9),
('columbia city', 9),
('gary', 9),
('yorktown', 9),
('park city', 9),
('phillipsburg', 9),
('corinth', 9),
('pineville', 9),
('russell springs', 9),
('gonzales', 9),
('kenner', 9),
('chicopee', 9),
('fall river', 9),
('hopkinton', 9),
('marlboro', 9),
('north attleboro', 9),
('provincetown', 9),
('winthrop', 9),
('edgewood', 9),
('glen burnie', 9),
('nottingham', 9),
('parkville', 9),
('towson', 9),
('palmyra', 9),
('poland', 9),
('adrian', 9),
('dearborn heights', 9),
('flushing', 9),
('royal oak', 9),
('southfield', 9),
('brainerd', 9),
('coon rapids', 9),
('eden prairie', 9),
('ely', 9),
('minnetonka', 9),
('shoreview', 9),
('neosho', 9),
('nevada', 9),
('biloxi', 9),
('durant', 9),
('ripley', 9),
('apex', 9),
('cornelius', 9),
('garner', 9),
('huntersville', 9),
('waxhaw', 9),
('grand forks', 9),
('brady', 9),
('bayonne', 9),
('beachwood', 9),
('cherry hill', 9),
('new brunswick', 9),
('point pleasant', 9),
('alamogordo', 9),
('fallon', 9),
('dundee', 9),
('fairport', 9),
('liverpool', 9),
('riverhead', 9),
('boardman', 9),
('cuyahoga falls', 9),
('findlay', 9),
('kettering', 9),
('powell', 9),
('strongsville', 9),
('wooster', 9),
('bartlesville', 9),
('ponca city', 9),
('poteau', 9),
('weatherford', 9),
('baker city', 9),
('keizer', 9),
('chambersburg', 9),
('langhorne', 9),
('lansdale', 9),
('hixson', 9),
('sevierville', 9),
('bertram', 9),
('cedar park', 9),
('humble', 9),
('keller', 9),
('league city', 9),
('stephenville', 9),
('logan', 9),
('magna', 9),
('vernal', 9),
('burke', 9),
('hoquiam', 9),
('mercer island', 9),
('orting', 9),
('selah', 9),
('sheboygan', 9),
('beckley', 9),
('homer', 8),
('kenai', 8),
('crossville', 8),
('eldridge', 8),
('enterprise', 8),
('foley', 8),
('gadsden', 8),
('new hope', 8),
('selma', 8),
('arden', 8),
('glenwood', 8),
('pottsville', 8),
('san carlos', 8),
('tonopah', 8),
('aliso viejo', 8),
('antelope', 8),
('benicia', 8),
('bishop', 8),
('blythe', 8),
('canyon country', 8),
('ceres', 8),
('commerce', 8),
('compton', 8),
('corning', 8),
('desert hot springs', 8),
('diamond bar', 8),
('el cerrito', 8),
('la habra', 8),
('la quinta', 8),
('live oak', 8),
('los gatos', 8),
('orinda', 8),
('penn valley', 8),
('san dimas', 8),
('south pasadena', 8),
('south san francisco', 8),
('sun valley', 8),
('trinidad', 8),
('valley springs', 8),
('eagle', 8),
('evergreen', 8),
('morrison', 8),
('rifle', 8),
('dayville', 8),
('guilford', 8),
('marlborough', 8),
('southington', 8),
('vernon', 8),
('atlantic beach', 8),
('aventura', 8),
('key largo', 8),
('loxahatchee', 8),
('margate', 8),
('oakland park', 8),
('orange park', 8),
('polk city', 8),
('sebastian', 8),
('bloomingdale', 8),
('cairo', 8),
('gray', 8),
('grayson', 8),
('lithia springs', 8),
('morrow', 8),
('newnan', 8),
('reidsville', 8),
('stone mountain', 8),
('sylvania', 8),
('warner robins', 8),
('whitesburg', 8),
('keaau', 8),
('ainsworth', 8),
('dorchester', 8),
('fort dodge', 8),
('granville', 8),
('woodward', 8),
('montpelier', 8),
('pocatello', 8),
('des plaines', 8),
('granite city', 8),
('hinsdale', 8),
('hoffman estates', 8),
('ingleside', 8),
('new lenox', 8),
('olney', 8),
('oregon', 8),
('vandalia', 8),
('westmont', 8),
('angola', 8),
('charlestown', 8),
('lagrange', 8),
('lawrenceburg', 8),
('pendleton', 8),
('tipton', 8),
('marquette', 8),
('mission', 8),
('bardstown', 8),
('london', 8),
('louisa', 8),
('nicholasville', 8),
('scottsville', 8),
('agawam', 8),
('berkley', 8),
('framingham', 8),
('marstons mills', 8),
('stoughton', 8),
('friendship', 8),
('north east', 8),
('owings mills', 8),
('belgrade', 8),
('farmingdale', 8),
('otis', 8),
('raymond', 8),
('wells', 8),
('burton', 8),
('carson city', 8),
('east lansing', 8),
('fostoria', 8),
('grand blanc', 8),
('linden', 8),
('edina', 8),
('fairmont', 8),
('mankato', 8),
('maple grove', 8),
('mcgregor', 8),
('richfield', 8),
('branson', 8),
('ironton', 8),
('osage beach', 8),
('raymore', 8),
('brookhaven', 8),
('ocean springs', 8),
('wolf point', 8),
('goldsboro', 8),
('highlands', 8),
('mebane', 8),
('keene', 8),
('bergenfield', 8),
('east brunswick', 8),
('mahwah', 8),
('parsippany', 8),
('sayreville', 8),
('sweetwater', 8),
('hobbs', 8),
('socorro', 8),
('babylon', 8),
('cheektowaga', 8),
('endicott', 8),
('hicksville', 8),
('hyde park', 8),
('jamaica', 8),
('pine bush', 8),
('white plains', 8),
('christiansburg', 8),
('maumee', 8),
('south point', 8),
('west manchester', 8),
('enid', 8),
('clackamas', 8),
('hermiston', 8),
('hood river', 8),
('blacksburg', 8),
('easley', 8),
('little river', 8),
('dandridge', 8),
('dyersburg', 8),
('cleburne', 8),
('del rio', 8),
('hurst', 8),
('the colony', 8),
('bountiful', 8),
('green river', 8),
('roy', 8),
('luray', 8),
('spotsylvania', 8),
('suffolk', 8),
('chehalis', 8),
('duvall', 8),
('friday harbor', 8),
('gold bar', 8),
('pullman', 8),
('baraboo', 8),
('cedarburg', 8),
('plover', 8),
('sturgeon bay', 8),
('wautoma', 8),
('ketchikan', 7),
('seward', 7),
('anniston', 7),
('gardendale', 7),
('orange beach', 7),
('piedmont', 7),
('cabot', 7),
('garfield', 7),
('arizona city', 7),
('camp verde', 7),
('paradise valley', 7),
('anaheim hills', 7),
('banning', 7),
('big sur', 7),
('borrego springs', 7),
('burlingame', 7),
('cadiz', 7),
('cupertino', 7),
('daly city', 7),
('del mar', 7),
('desert center', 7),
('el monte', 7),
('el segundo', 7),
('foster city', 7),
('gardena', 7),
('grenada', 7),
('grover beach', 7),
('hanford', 7),
('joshua tree', 7),
('la crescenta', 7),
('lake arrowhead', 7),
('nipomo', 7),
('orange county', 7),
('panorama city', 7),
('paramount', 7),
('rancho cordova', 7),
('rosemead', 7),
('san anselmo', 7),
('san bruno', 7),
('san fernando', 7),
('sylmar', 7),
('west hollywood', 7),
('west sacramento', 7),
('yucca valley', 7),
('bayfield', 7),
('commerce city', 7),
('dillon', 7),
('nederland', 7),
('platteville', 7),
('sedalia', 7),
('strasburg', 7),
('east haven', 7),
('hebron', 7),
('new london', 7),
('oakville', 7),
('prospect', 7),
('westville', 7),
('anthony', 7),
('casselberry', 7),
('crystal beach', 7),
('kendall', 7),
('north miami', 7),
('north miami beach', 7),
('oldsmar', 7),
('parkland', 7),
('parrish', 7),
('port orange', 7),
('rockledge', 7),
('tamarac', 7),
('college park', 7),
('dahlonega', 7),
('evans', 7),
('midway', 7),
('moultrie', 7),
('norcross', 7),
('oakwood', 7),
('st. simons island', 7),
('sycamore', 7),
('villa rica', 7),
('kahului', 7),
('kapaa', 7),
('mason city', 7),
('solon', 7),
('blackfoot', 7),
('emmett', 7),
('midvale', 7),
('rexburg', 7),
('algonquin', 7),
('chicago heights', 7),
('dekalb', 7),
('edinburg', 7),
('hanover park', 7),
('harvard', 7),
('lake villa', 7),
('manteno', 7),
('mundelein', 7),
('seneca', 7),
('woodridge', 7),
('brazil', 7),
('wabash', 7),
('walton', 7),
('west lafayette', 7),
('zionsville', 7),
('emporia', 7),
('norton', 7),
('mayfield', 7),
('millersburg', 7),
('pikeville', 7),
('blanchard', 7),
('houma', 7),
('brookline', 7),
('malden', 7),
('shirley', 7),
('south yarmouth', 7),
('southbridge', 7),
('waltham', 7),
('westford', 7),
('weymouth', 7),
('dundalk', 7),
('fort washington', 7),
('severn', 7),
('sykesville', 7),
('ellsworth', 7),
('presque isle', 7),
('attica', 7),
('davison', 7),
('flat rock', 7),
('gaylord', 7),
('kentwood', 7),
('south lyon', 7),
('sturgis', 7),
('taylor', 7),
('crosby', 7),
('inver grove heights', 7),
('little falls', 7),
('cuba', 7),
('starkville', 7),
('conover', 7),
('havelock', 7),
('statesville', 7),
('surf city', 7),
('hooksett', 7),
('windham', 7),
('asbury park', 7),
('keansburg', 7),
('marlton', 7),
('perth amboy', 7),
('ridgewood', 7),
('sewell', 7),
('gallup', 7),
('tularosa', 7),
('winnemucca', 7),
('bayside', 7),
('coram', 7),
('cortland', 7),
('forest hills', 7),
('herkimer', 7),
('hudson falls', 7),
('jericho', 7),
('kings park', 7),
('lake george', 7),
('massapequa', 7),
('montauk', 7),
('peekskill', 7),
('perrysburg', 7),
('port washington', 7),
('sag harbor', 7),
('saratoga springs', 7),
('selden', 7),
('smithtown', 7),
('sunnyside', 7),
('amelia', 7),
('barberton', 7),
('brook park', 7),
('celina', 7),
('kilgore', 7),
('miamisburg', 7),
('north royalton', 7),
('peebles', 7),
('piqua', 7),
('port clinton', 7),
('reynoldsburg', 7),
('sunbury', 7),
('wapakoneta', 7),
('el reno', 7),
('muskogee', 7),
('sapulpa', 7),
('christmas valley', 7),
('rogue river', 7),
('bensalem', 7),
('bethel park', 7),
('conshohocken', 7),
('edinboro', 7),
('fairless hills', 7),
('norristown', 7),
('philipsburg', 7),
('yardley', 7),
('narragansett', 7),
('west warwick', 7),
('woonsocket', 7),
('ladson', 7),
('north charleston', 7),
('taylors', 7),
('travelers rest', 7),
('eagle butte', 7),
('elizabethton', 7),
('pigeon forge', 7),
('tullahoma', 7),
('alice', 7),
('brownwood', 7),
('euless', 7),
('palestine', 7),
('southlake', 7),
('draper', 7),
('kearns', 7),
('spanish fork', 7),
('culpeper', 7),
('reston', 7),
('staunton', 7),
('langley', 7),
('sultan', 7),
('west seattle', 7),
('menomonie', 7),
('oconomowoc', 7),
('tomah', 7),
('moundsville', 7),
('petersburg', 6),
('adamsville', 6),
('alabaster', 6),
('bremen', 6),
('daphne', 6),
('fort morgan', 6),
('springville', 6),
('bryant', 6),
('elkins', 6),
('glen rose', 6),
('malvern', 6),
('searcy', 6),
('van buren', 6),
('chino valley', 6),
('jerome', 6),
('kearny', 6),
('agoura hills', 6),
('alamo', 6),
('atascadero', 6),
('calistoga', 6),
('canyon', 6),
('cedarville', 6),
('cerritos', 6),
('clearlake', 6),
('collegeville', 6),
('coloma', 6),
('duarte', 6),
('granada hills', 6),
('half moon bay', 6),
('inglewood', 6),
('lone pine', 6),
('ludlow', 6),
('mammoth lakes', 6),
('marina del rey', 6),
('monterey park', 6),
('moorpark', 6),
('mount shasta', 6),
('nevada city', 6),
('north highlands', 6),
('pacifica', 6),
('pollock pines', 6),
('rancho mirage', 6),
('rosamond', 6),
('santa paula', 6),
('shasta lake', 6),
('vinton', 6),
('westlake village', 6),
('wilton', 6),
('yreka', 6),
('bailey', 6),
('centennial', 6),
('cortez', 6),
('estes park', 6),
('falcon', 6),
('northglenn', 6),
('ovid', 6),
('steamboat springs', 6),
('wheat ridge', 6),
('cheshire', 6),
('glastonbury', 6),
('naugatuck', 6),
('north haven', 6),
('old lyme', 6),
('old saybrook', 6),
('somers', 6),
('stonington', 6),
('winsted', 6),
('elsmere', 6),
('rehoboth beach', 6),
('townsend', 6),
('dunnellon', 6),
('indialantic', 6),
('indian rocks beach', 6),
('lake placid', 6),
('lutz', 6),
('middleburg', 6),
('palm beach', 6),
('ponte vedra beach', 6),
('shalimar', 6),
('silver springs', 6),
('tavares', 6),
('valparaiso', 6),
('warrington', 6),
('bainbridge', 6),
('calhoun', 6),
('grovetown', 6),
('martin', 6),
('pembroke', 6),
('powder springs', 6),
('haleiwa', 6),
('lihue', 6),
('wahiawa', 6),
('clarion', 6),
('de soto', 6),
('keokuk', 6),
('preston', 6),
('burley', 6),
('beecher', 6),
('berwyn', 6),
('blue island', 6),
('carpentersville', 6),
('caseyville', 6),
('durand', 6),
('jerseyville', 6),
('lisle', 6),
('machesney park', 6),
('matteson', 6),
('mclean', 6),
('potomac', 6),
('ridgway', 6),
('ringwood', 6),
('rushville', 6),
('sandwich', 6),
('shorewood', 6),
('sugar grove', 6),
('sullivan', 6),
('summit', 6),
('taylorville', 6),
('vernon hills', 6),
('wauconda', 6),
('worth', 6),
('chesterton', 6),
('coatesville', 6),
('kendallville', 6),
('merrillville', 6),
('pittsboro', 6),
('rising sun', 6),
('chanute', 6),
('colby', 6),
('hays', 6),
('latham', 6),
('pratt', 6),
('wellsville', 6),
('berea', 6),
('hazard', 6),
('abbeville', 6),
('gretna', 6),
('minden', 6),
('sulphur', 6),
('west monroe', 6),
('youngsville', 6),
('billerica', 6),
('braintree', 6),
('chelmsford', 6),
('foxboro', 6),
('heath', 6),
('hyannis', 6),
('leominster', 6),
('raynham', 6),
('stow', 6),
('whitman', 6),
('yarmouth', 6),
('mt. airy', 6),
('bar harbor', 6),
('old orchard beach', 6),
('saco', 6),
('skowhegan', 6),
('turner', 6),
('algonac', 6),
('forest grove', 6),
('hudsonville', 6),
('iron mountain', 6),
('lake orion', 6),
('mayville', 6),
('petoskey', 6),
('saline', 6),
('fergus falls', 6),
('jordan', 6),
('owatonna', 6),
('victoria', 6),
('white bear lake', 6),
('ballwin', 6),
('bridgeton', 6),
('center', 6),
('festus', 6),
('hazelwood', 6),
('millersville', 6),
('passaic', 6),
('sikeston', 6),
('natchez', 6),
...]
lagunacity = df.loc[df['Location.City'] == "laguna beach"]
lagunashape=lagunacity['Data.Shape']
lagunacommonshapes=nltk.FreqDist(lagunashape).most_common(15)
glendora = df.loc[df['Location.City'] == "glendora"]
glendorashape=glendora['Data.Shape']
glendoracommonshapes=nltk.FreqDist(glendorashape).most_common(15)
michigan = df.loc[df['Location.City'] == "michigan city"]
michiganshape=michigan['Data.Shape']
michigancommonshapes=nltk.FreqDist(michiganshape).most_common(15)
niagara = df.loc[df['Location.City'] == "niagara falls"]
niagarashape=niagara['Data.Shape']
niagaracommonshapes=nltk.FreqDist(niagarashape).most_common(15)
fig, (ax1, ax2, ax3, ax4) = plt.subplots(nrows=4, ncols=1, gridspec_kw={'hspace': 0.5}, figsize=(9, 12))
for shape, count in lagunacommonshapes:
    ax1.barh(shape, count)
ax1.set_title('Common Shapes in Laguna')
ax1.set_xlabel('Frequency')
ax1.set_ylabel('Shape')
for shape, count in glendoracommonshapes:
    ax2.barh(shape, count)
ax2.set_title('Common Shapes in Glendora')
ax2.set_xlabel('Frequency')
ax2.set_ylabel('Shape')
for shape, count in michigancommonshapes:
    ax3.barh(shape, count)
ax3.set_title('Common Shapes in Michigan City')
ax3.set_xlabel('Frequency')
ax3.set_ylabel('Shape')
for shape, count in niagaracommonshapes:
    ax4.barh(shape, count)
ax4.set_title('Common Shapes in Niagara Falls')
ax4.set_xlabel('Frequency')
ax4.set_ylabel('Shape')

Fourth Visualization: The fourth visualization looks at four cities with relatively few UFO sightings (Laguna Beach, Glendora, Michigan City, and Niagara Falls) to see which shapes were reported most often in each. In Laguna Beach, the most common shape is “circle”, reported about 3 times as often as the city's next most common shapes. In Glendora, “light”, “disk”, and “circle” are tied for most common, each reported about twice as often as the remaining shapes. In Michigan City, “sphere” is the most common shape, reported about twice as often as the other shapes. Finally, in Niagara Falls, “formation” and “circle” are tied for most common, each reported about twice as often as the other shapes.
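The "times as often" comparisons above are read off the bar charts by eye. As a rough way to compute them instead, the sketch below counts shapes with `collections.Counter` (which `nltk.FreqDist` is built on) and reports the top shape or shapes, their count, and the ratio to the next-highest count. The helper name `top_shape_ratio` and the sample list are made up for illustration and are not part of the original notebook.

```python
from collections import Counter


def top_shape_ratio(shapes):
    """Return (top shapes, top count, ratio of top count to the next-highest count).

    Hypothetical helper for illustration; the ratio is None when every
    shape is tied or when the input is empty.
    """
    counts = Counter(shapes).most_common()  # sorted by count, descending
    if not counts:
        return [], 0, None
    top_count = counts[0][1]
    # All shapes tied for first place
    top_shapes = [shape for shape, n in counts if n == top_count]
    # Counts strictly below the top; the first one is the runner-up
    lower = [n for _, n in counts if n < top_count]
    ratio = top_count / lower[0] if lower else None
    return top_shapes, top_count, ratio


# Made-up sample, not taken from the dataset
sample = ["circle"] * 3 + ["light", "disk", "triangle"]
print(top_shape_ratio(sample))
```

In the notebook, a call such as `top_shape_ratio(lagunashape)` should work in the same way, since iterating over a pandas Series yields its values.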
# change in frequency of each UFO shape over time
filtered_df = df.loc[(df['Dates.Sighted.Year'] >= 1997) & (df['Dates.Sighted.Year'] <= 2014)]
grouped_data = filtered_df.groupby(['Dates.Sighted.Year', 'Data.Shape']).size().unstack()
grouped_data.plot.bar(stacked=True, colormap='tab20')
plt.legend(title='Shape',bbox_to_anchor=(1.5,1))
plt.title('Shape Frequencies over the Years')
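To make the `groupby(...).size().unstack()` step in the cell above easier to follow, here is a minimal sketch on a handful of made-up rows that reuse the notebook's column names. The rows are invented for illustration only. The `fill_value=0` argument is an addition that is not in the original cell; without it, year/shape combinations with no sightings appear as NaN.

```python
import pandas as pd

# Made-up rows that reuse the notebook's column names; not real sightings
toy = pd.DataFrame({
    "Dates.Sighted.Year": [1997, 1997, 1997, 1998, 1998],
    "Data.Shape": ["light", "light", "disk", "light", "circle"],
})

# Count rows per (year, shape) pair, then pivot the shapes into columns
# so that each row is a year and each column is a shape
counts = (
    toy.groupby(["Dates.Sighted.Year", "Data.Shape"])
    .size()
    .unstack(fill_value=0)
)
print(counts)
```

Each row of `counts` becomes one stacked bar in the plot, and each column becomes one colored segment.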

Discussion
The main findings of my analysis are that UFO sightings have increased over the years, that most sightings report a ‘light’ shape, and that sightings are most common in large cities. It also seems that the cities with the most UFO sightings have ‘light’ as their most common shape, whereas the cities with fewer sightings have various other shapes as their most common.
I noticed that the cities with the most UFO sightings, such as Las Vegas, Seattle, Los Angeles, and Phoenix, are all large cities with a lot of infrastructure, such as skyscrapers, and a lively nightlife, in contrast to cities that consist mostly of suburbs. This could be an explanation for why
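One possible way to check the claim about each city's most common shape is sketched below. The helper name `most_common_shape_by_city` and the rows in the toy table are made up for illustration; only the column names come from the dataset. Ties are broken arbitrarily by `idxmax`, so this is a rough check and not a definitive method.

```python
import pandas as pd


def most_common_shape_by_city(df, cities):
    """Return a Series mapping each listed city to its most frequent shape.

    Hypothetical helper for illustration; assumes the notebook's
    'Location.City' and 'Data.Shape' columns.
    """
    subset = df[df["Location.City"].isin(cities)]
    return subset.groupby("Location.City")["Data.Shape"].agg(
        lambda shapes: shapes.value_counts().idxmax()
    )


# Made-up rows, not real sightings
toy = pd.DataFrame({
    "Location.City": ["seattle", "seattle", "seattle",
                      "glendora", "glendora", "glendora"],
    "Data.Shape": ["light", "light", "disk",
                   "circle", "circle", "light"],
})

result = most_common_shape_by_city(toy, ["seattle", "glendora"])
print(result)
```

On the full dataset, passing the list of top cities and then the list of low-count cities would give two Series that can be compared directly.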